A hybrid approach to Urdu verb phrase chunking
Abstract
A variety of verb phrases exist in Urdu, including simple verb phrases, conjunct verb phrases and compound verb phrases. This paper explains the structure of Urdu verb phrases and details a series of experiments to tag them automatically. Initially, a rule-based model is developed using 21 linguistic rules for automatic VP chunking. A 100,000-word Urdu corpus is manually tagged with VP chunk tags. The corpus is then used to develop a hybrid approach that combines HMM-based statistical chunking with correction rules. The technique is enhanced by changing the chunking direction and merging chunk and POS tags. The automatically chunked data is compared with manually tagged held-out data to identify and analyze the errors. Based on this analysis, correction rules are extracted to address the errors. Applying these rules after statistical tagging further improves chunking accuracy. The results of all experiments are reported, with a maximum overall accuracy of 98.44% achieved using the hybrid approach with an extended tagset.
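The pipeline described in the abstract (statistical HMM chunking followed by correction rules) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy POS tags (`NN`, `VB`, `AUX`, `PSP`), the `B-VP`/`I-VP`/`O` chunk labels, the add-one smoothing, and the single correction rule. The paper's actual 21 linguistic rules, tagset, chunking direction and extracted correction rules are not reproduced.

```python
from collections import Counter, defaultdict
import math

def train(sentences):
    """Count bigram transitions between chunk tags and POS emissions per chunk tag."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo state
        for pos, chunk in sent:
            trans[prev][chunk] += 1
            emit[chunk][pos] += 1
            prev = chunk
    return trans, emit

def logp(counter, key, size):
    """Add-one smoothed log probability (a simplifying assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + size))

def viterbi(pos_seq, trans, emit):
    """Return the most probable chunk-tag sequence for a POS-tag sequence."""
    states = sorted(emit)
    n_states = len(states)
    n_obs = len({p for c in emit.values() for p in c}) + 1  # +1 for unseen POS
    best = {
        s: (logp(trans["<s>"], s, n_states) + logp(emit[s], pos_seq[0], n_obs), [s])
        for s in states
    }
    for pos in pos_seq[1:]:
        new = {}
        for s in states:
            score, path = max(
                ((best[p][0] + logp(trans[p], s, n_states), best[p][1]) for p in states),
                key=lambda x: x[0],
            )
            new[s] = (score + logp(emit[s], pos, n_obs), path + [s])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

def apply_correction_rules(chunks):
    """Hypothetical post-correction rule: a chunk cannot start with I-VP."""
    fixed = list(chunks)
    for i, tag in enumerate(fixed):
        prev = fixed[i - 1] if i > 0 else "O"
        if tag == "I-VP" and prev == "O":
            fixed[i] = "B-VP"
    return fixed

def hybrid_chunk(pos_seq, trans, emit):
    """Statistical tagging first, then rule-based correction."""
    return apply_correction_rules(viterbi(pos_seq, trans, emit))

# Toy training data: (POS tag, chunk tag) pairs; invented for illustration only.
TRAIN = [
    [("NN", "O"), ("VB", "B-VP"), ("AUX", "I-VP")],
    [("NN", "O"), ("PSP", "O"), ("NN", "O"), ("VB", "B-VP")],
    [("NN", "O"), ("VB", "B-VP"), ("AUX", "I-VP"), ("AUX", "I-VP")],
]
MODEL = train(TRAIN)
print(hybrid_chunk(["NN", "VB", "AUX"], *MODEL))
```

In this sketch the correction step runs only after decoding, which mirrors the order stated in the abstract (rules applied after statistical tagging). Reversing the chunking direction could be approximated by reversing each sentence before training and decoding, and an extended tagset by using merged labels such as a chunk tag joined with its POS tag as HMM states; both are left out here to keep the example short.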
Similar resources
NP Subject Detection in Verb-Initial Arabic Clauses
Phrase re-ordering is a well-known obstacle to robust machine translation for language pairs with significantly different word orderings. For Arabic-English, two languages that usually differ in the ordering of subject and verb, the subject and its modifiers must be accurately moved to produce a grammatical translation. This operation requires more than base phrase chunking and often defies cur...
A Hybrid Approach to Chinese Base Noun Phrase Chunking
In this paper, we propose a hybrid approach to chunking Chinese base noun phrases (base NPs), which combines a Support Vector Machine (SVM) model and a Conditional Random Field (CRF) model. In order to compare the results from the two chunkers, we use the discriminative post-processing method, whose measure criterion is the conditional probability generated from the CRF chunker. With respec...
Semantic Priming Effect on Relative Clause Attachment Ambiguity Resolution in L2
This study examined whether processing ambiguous sentences containing relative clauses (RCs) following a complex determiner phrase (DP) by Persian-speaking learners of L2 English with different proficiency and working memory capacities (WMCs) is affected by semantic priming. The semantic relationship studied was one between the subject/verb of the main clause and one of the DPs in the complex D...
New Phrase Chunking Algorithm for Myanmar Natural Language Processing
Chunking is the subdivision of sentences into non-recursive regular syntactical groups: verbal chunks, nominal chunks, adjective chunks, adverbial chunks, propositional chunks, etc. The chunker can operate as a preprocessor for Natural Language Processing systems. This study aims to propose a new phrase chunking algorithm for Myanmar natural language processing. The developed new algorithm acce...
Influence of Conditional Independence Assumption on Verb Subcategorization Detection
Learning Bayesian Belief Networks from corpora has been applied to the automatic acquisition of verb subcategorization frames for Modern Greek (MG). We are incorporating minimal linguistic resources, i.e., morphological tagging and phrase chunking, since a general-purpose syntactic parser for MG is currently unavailable. Comparative experimental results have been evaluated against Naive Bayes cla...